Deep Jam: Conversion of Coarse-Grain Parallelism to Fine-Grain and Vector Parallelism
Authors
Abstract
A number of computational applications lack instruction-level parallelism. The loss is particularly acute for sequences of dependent instructions on wide-issue or deeply pipelined architectures. We consider four real applications from computational biology, cryptanalysis, and data compression. These applications are characterized by long sequences of dependent instructions, irregular control flow, and intricate scalar and memory dependence patterns. Although these benchmarks exhibit good memory locality and branch predictability, state-of-the-art compiler optimizations fail to exploit much instruction-level parallelism. This paper shows that major performance gains are possible on such applications through a loop transformation called deep jam. The transformation reshapes the control flow of a program to facilitate the extraction of independent computations by classical back-end techniques. Deep jam combines accurate dependence analysis and control speculation with a generalized form of recursive, multi-variant unroll-and-jam; it brings together independent instructions across irregular control structures, removing memory-based dependences through scalar and array renaming. This optimization contributes to the extraction of fine-grain parallelism in irregular applications. We propose a feedback-directed deep jam algorithm that selects a jamming strategy as a function of the architecture and the application's characteristics.
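To make the transformation concrete, the following is a minimal sketch in C of what a depth-1 deep jam with an unroll factor of 2 might produce on a toy loop. The kernel, the function names (original, deep_jammed), and the variant selection are illustrative assumptions, not taken from the paper's benchmarks; a real implementation would also apply array renaming and deeper, recursive jamming.

    /* Before: the branch hides any overlap between iterations, and the
       temporary t serializes each body into one dependent chain.
       (Hypothetical kernel, for illustration only.) */
    void original(const float *a, float *b, int n) {
        for (int i = 0; i < n; i++) {
            float t = a[i];
            if (t > 0.0f) {
                t = t * t + 1.0f;   /* chain of dependent operations */
                t = t * 0.5f;
            } else {
                t = -t;
            }
            b[i] = t;
        }
    }

    /* After: two iterations are jammed. The scalar t is renamed to
       t0/t1 so the two chains are independent, and the likely branch
       combination gets a specialized straight-line variant that the
       back end can schedule or vectorize, with a fallback variant
       covering the remaining cases. */
    void deep_jammed(const float *a, float *b, int n) {
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            float t0 = a[i], t1 = a[i + 1];       /* scalar renaming */
            if (t0 > 0.0f && t1 > 0.0f) {         /* jammed hot variant */
                t0 = t0 * t0 + 1.0f;  t1 = t1 * t1 + 1.0f;
                t0 = t0 * 0.5f;       t1 = t1 * 0.5f;
            } else {                              /* fallback: original bodies */
                if (t0 > 0.0f) { t0 = (t0 * t0 + 1.0f) * 0.5f; } else { t0 = -t0; }
                if (t1 > 0.0f) { t1 = (t1 * t1 + 1.0f) * 0.5f; } else { t1 = -t1; }
            }
            b[i] = t0;
            b[i + 1] = t1;
        }
        for (; i < n; i++) {                      /* epilogue for odd n */
            float t = a[i];
            b[i] = (t > 0.0f) ? (t * t + 1.0f) * 0.5f : -t;
        }
    }

The key point is that the jammed hot variant interleaves two mutually independent dependence chains, which is what lets a wide-issue scheduler or a vectorizer recover the throughput lost to the original serial chain.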
Similar Resources
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications
A number of compute-intensive applications suffer from performance loss due to the lack of instruction-level parallelism in sequences of dependent instructions. This is particularly acute on wide-issue architectures with large register banks, when the memory hierarchy (locality and bandwidth) is not the dominant bottleneck. We consider two real applications from computational biology and fro...
Efficient support for fine-grain parallelism on shared-memory machines
A coarse-grain parallel program typically has one thread (task) per processor, whereas a fine-grain program has one thread for each independent unit of work. Although there are several advantages to fine-grain parallelism, conventional wisdom is that coarse-grain parallelism is more efficient. This paper illustrates the advantages of fine-grain parallelism and presents an efficient implementati...
Structure and Performance of Fine-Grain Parallelism in Genetic Search
Within the parallel genetic algorithm framework, there currently exists a growing dichotomy between coarse-grain and fine-grain parallel architectures. This paper attempts to characterize the need for fine-grain parallelism, and to introduce and compare three models of fine-grain parallel genetic algorithms (GAs). The performance of the three models is examined on seventeen test problems and is ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
Journal: J. Instruction-Level Parallelism
Volume: 9, Issue: -
Pages: -
Publication date: 2007